I. BACKGROUND

(Parts of this section are nearly verbatim extractions from sections I and II of reference (10).)

Traditionally, weapon system cost estimates have been prepared using Industrial Engineering (I.E.) techniques. These techniques involved detailed studies of the operations and materials required to produce the new system. The cost estimate frequently required several thousand hours to produce, with volumes of supporting documentation, and changes in design required extensive changes in these estimates. In spite of all the time and effort involved in preparing them, the accuracy of these estimates leaves much to be desired. This is evidenced by the large cost overruns cited in the annual General Accounting Office (GAO) reports to Congress.

In 1972, for example, the GAO reported that the Navy had experienced a cost growth of $19 billion on 24 weapon systems in FY 1971. Approximately 15% of this cost growth was attributed to poor initial cost estimates for the weapon systems. The report went on to make the following recommendation:

"Develop and implement DOD wide guidance for consistent and effective cost estimating procedures and practices particularly with regard to, ... an effective independent review of cost estimates."

Three months prior to the GAO recommendation, Deputy Secretary of Defense David Packard suggested the use of Independent Parametric Cost Estimation (IPCE) as a possible solution to poor initial cost estimates. In a memorandum dated December 7, 1971 to the Service Secretaries, Mr. Packard stated:

"Parametric cost estimates available in 1964 on the F-111A and in 1965 on the C-5A came within 20 percent of the actual costs currently being experienced."

Mr. Packard then directed each of the Service Secretaries to:

1. Improve their capability to perform independent parametric cost analysis.

2. Have such analyses done on each major weapon system at each key decision point in the weapon system acquisition process.

3. Make the analysis available directly to the Defense System Acquisition Review Council (DSARC) at each DSARC review starting January 1, 1972.

Secretary of Defense Melvin Laird, on January 23, 1972, issued a memorandum supporting the Packard memo and establishing a high-level DOD organization (CAIG: Cost Analysis Improvement Group) to review IPCEs and report on their soundness to the DSARC.

Because of the rigidity and poor performance of the traditional approach, some early successes using the parametric approach, and the cited high-level directives, independent parametric cost estimation has been receiving considerable attention in the Department of Defense as a means of increasing the accuracy of cost estimates. This procedure is based on the premise that the cost of a weapon system is related in a quantifiable way to the system's physical and performance characteristics.*

*These characteristics are referred to as system "parameters" in the cost estimation literature and should not be confused with statistical parameters (e.g., standard deviations, regression coefficients, etc.). Statisticians would refer to system "parameters" as predictor (independent) variables and "cost" as the criterion (dependent) variable. They would also refer to the goal as cost prediction instead of estimation.


A Parametric Cost Estimate has been defined by Baker (1) as:

"An estimate which predicts cost by means of explanatory variables such as performance characteristics, physical characteristics, and characteristics relevant to the development process, as derived from experience on logically related systems."

The construction and use of cost estimating relationships (CERs) form the foundation for making IPCEs. Cost estimating relationships are mathematical equations which relate system costs to various explanatory variables. They are most generally derived through statistical regression analysis of historical cost data. These techniques are described in (9). Some examples of their use appear in (2), (4), (5), and (11).
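To make the idea concrete, the following is a minimal sketch of a single-variable CER fitted by regression. It assumes a power-law form, cost = a · weight^b, a common CER shape chosen here only for illustration (the cited references may use other forms); the function names `fit_power_cer` and `predict_cost` and the data are hypothetical.

```python
import numpy as np


def fit_power_cer(weight, cost):
    """Fit the hypothetical CER cost = a * weight**b by least squares on logs."""
    log_w = np.log(np.asarray(weight, dtype=float))
    log_c = np.log(np.asarray(cost, dtype=float))
    # log(cost) = log(a) + b * log(weight) is an ordinary linear regression
    design = np.column_stack([np.ones_like(log_w), log_w])
    (log_a, b), *_ = np.linalg.lstsq(design, log_c, rcond=None)
    return float(np.exp(log_a)), float(b)


def predict_cost(a, b, weight):
    """Evaluate the fitted CER at a new weight."""
    return a * np.asarray(weight, dtype=float) ** b


# Made-up "historical" data lying exactly on cost = 2 * weight**0.8
weights = np.array([100.0, 200.0, 400.0, 800.0])
costs = 2.0 * weights ** 0.8
a, b = fit_power_cer(weights, costs)
print(a, b)  # approximately 2.0 and 0.8, since the data are exact
print(predict_cost(a, b, 300.0))
```

Fitting in logarithms makes the power law linear in its coefficients; note that it minimizes squared error in log cost rather than in dollars.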

The parametric approach has some distinct advantages and disadvantages compared to I.E. methodology. On the plus side are:

1. Parametric cost estimates can be developed during the concept formulation stage of the acquisition process, before detailed engineering plans are available. These early cost estimates can be used to:

(a) Identify possible cost/performance tradeoffs in the design effort.

(b) Provide a basis for cost/effectiveness review of performance specifications.

(c) Provide information useful in the ranking of competing alternatives.

(d) Suggest a need for identifying and considering new alternatives.

2. Historical cost data incorporate system development setbacks, such as engineering and design specification changes and other items that are not identifiable at the time of design. Industrial engineering (I.E.) estimates tend to be optimistic in that they do not allow for unforeseen problems. Unexpected engineering or design changes usually bring about unexpected increases in system cost. Cost estimating relationships based on historical data will incorporate some of these unknowns into the cost estimate.

Possible problems with the parametric approach include:

1. Unlike the I.E. approach, many subjective assessments and decisions must be made by the analyst, including:

(a) Selecting the "analogous" systems to include in the historical data base.

(b) Selecting the form of the prediction equation.

(c) Selecting an appropriate subset of performance/design characteristics to include in the final prediction equation.

These decisions can lead to conflicting estimates by different analysts even when sound statistical practices are employed. There is no universally "best" approach to the selection problems stated above. Subjectivity cannot be avoided but can be incorporated in a consistent manner using the Bayesian approach to prediction as discussed by Lindley (6).

2. Historical data sets are often characterized by sample sizes which are relatively small compared to the number of potential predictor variables. This often leads to an overstatement of the degree of fit supposedly obtained. Discussion of this problem can be found in (3) and (12).
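The overstatement is easy to reproduce. In the sketch below (an illustration with simulated numbers, not an analysis of any real data base), the "costs" and all candidate predictors are independent random noise, yet with n = 10 systems the ordinary R² climbs toward 1 as predictors are added. The adjusted R² shown alongside is one common, though only partial, correction; the helper names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """Ordinary R^2 of a least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for k predictors; undefined (nan) when n - k - 1 <= 0."""
    if n - k - 1 <= 0:
        return float("nan")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


rng = np.random.default_rng(0)
n = 10
y = rng.normal(size=n)                # "costs": pure noise
noise = rng.normal(size=(n, n - 1))   # candidate predictors unrelated to y

# Nested models using the first k noise columns, k = 1, ..., n - 1
r2_by_k = [r_squared(noise[:, :k], y) for k in range(1, n)]
for k, r2 in zip(range(1, n), r2_by_k):
    print(k, round(r2, 3), round(adjusted_r_squared(r2, n, k), 3))
```

With k = n - 1 predictors plus an intercept the fit is exact, so R² = 1 even though nothing real has been explained.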

The phrase "logically related systems" in the cited definition of parametric cost estimation is subject to all kinds of interpretation and degrees of relation. Certainly there is no historical system identical in all respects to the objective system (the system whose cost we wish to predict); otherwise the problem would not exist. At the other extreme, all military systems are "logically related" in (at least) the sense that they are military systems. Message-carrying pigeons, air-to-air missiles, jet aircraft and frisbees are "logically related" in that they all fly. Obviously, the analyst must take into account the degree of analogy between each system that is a candidate for the historical data base and the objective system. Analogy, according to Webster, is "a partial similarity between like features of two things on which a comparison may be based." How does one measure the degree of analogy between "logically related" systems, and how can one exploit these partial similarities in predicting the cost of an objective system?

In what follows, we propose Mahalanobis distance as a measure of analogy and discuss its implications for the processes of selecting (potential) members of the data base and tailoring a CER to a specific objective system. This is a distinct departure from the standard procedures recommended in (9) and (10) and used in developing every CER with which the writer is familiar. The distinction is fundamental and goes beyond measures of analogy.

The standard approach appears more oriented toward developing a cost-explaining equation relating the costs of a class (e.g., sonars, airframes, tanks, etc.) of historical systems to the characteristics of those systems. One need not have any specific system in mind while developing such a general-purpose descriptive equation. In fact, armed with an airframe CER [...] the explanatory variable "weight", two radically different ai[...]

[...] of a regression equation, which include (a) pure description and (b) prediction. Lindley (6) emphasizes that the technique used to develop a regression equation ought to be related to the intended use. In the present context, the intended use is prediction of the cost of a specific system, so using a CER (which was developed to describe historical relations without reference to any specific objective system) to predict the cost of a specific system is contrary to Lindley's recommendation and to common sense.
II. MEASURES OF ANALOGY

Having gathered and adjusted historical data on systems judged more or less analogous to a proposed system whose cost is to be estimated, the analyst proceeds with the task of developing a "best" cost estimating relationship (CER). This involves selecting the form of the CER, deciding which of the system variables (performance characteristics, design specifications, etc.) to include as predictor variables, and assessing the precision of the estimate. In parametric cost estimation, this is usually done through the use of multiple regression and some standard variable selection criterion, such as maximizing adjusted $R^2$ (equivalently, minimizing the residual mean square error [MSE]), maximizing F, or using Mallows' $C_p$.
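For contrast with what follows, here is a sketch of a standard criterion: an exhaustive search that ranks every k-variable subset by residual MSE. The function names `fit_mse` and `best_subsets_by_mse` and the simulated data are hypothetical; the point to notice is that the proposed system's characteristics never appear as an argument.

```python
import itertools

import numpy as np


def fit_mse(X, y):
    """Residual mean square of y regressed on the columns of X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return (resid @ resid) / (n - k - 1)


def best_subsets_by_mse(X, y, k):
    """Rank all k-variable subsets of the columns of X by MSE, smallest first.

    Nothing here depends on the objective system's characteristics.
    """
    scores = [(fit_mse(X[:, list(cols)], y), cols)
              for cols in itertools.combinations(range(X.shape[1]), k)]
    return sorted(scores)


# Simulated data: cost depends mainly on column 0, weakly on column 2
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=12)
for mse, cols in best_subsets_by_mse(X, y, 2)[:3]:
    print(cols, round(mse, 4))
```

For a fixed k, minimizing MSE is the same as maximizing $R^2$, which is why these criteria agree on the best subset of each size.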

All of these techniques share two properties: (1) for any fixed number of variables in the prediction equation, the optimal set of variables is the set which minimizes the MSE; and (2) they all ignore the values of the variables of the system whose cost is being estimated. The first of these properties is reasonable but myopic when the object is prediction. The second property seems contrary to common sense.

Suppose there are $n$ systems in the historical data base. Associated with the $i$th such system is a cost $Y_i$ and values of $p$ (candidate) predictor variables $X_{i1}, \ldots, X_{ij}, \ldots, X_{ip}$. Let $Y$ denote the vector of costs and $X_j$ the vector of characteristic $j$ values:

$$Y = (Y_1, \ldots, Y_n)', \qquad X_j = (X_{1j}, \ldots, X_{nj})', \qquad j = 1, \ldots, p,$$

and let

$$\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}, \qquad s_{jk} = \frac{1}{n}\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k), \qquad j, k = 1, \ldots, p,$$

denote the sample means and covariances. Denoting the values of the proposed system by lower case letters, we wish to predict the cost $y$ by exploiting the predictive ability of the characteristics $x_1, \ldots, x_j, \ldots, x_p$. This predictive ability is inferred from the apparent relation between historical costs and characteristics and from the degree of analogy between the proposed system and these historical data. How analogous is the proposed system to the historical data?

(a) Marginal comparisons: Analogy on a single dimension is straightforward. One could refer $x_j$ to a histogram of the $X_{ij}$ values, $i = 1, 2, \ldots, n$, as in Figure 1.

A statistic commonly used as a nonnegative distance index is simply the square of the standardized distance between $x_j$ and the mean of the $X_{ij}$'s, namely

$$M_j = \left(\frac{x_j - \bar{X}_j}{s_j}\right)^2,$$

where $s_j$ is defined as $(s_{jj})^{1/2}$. Large values of this statistic indicate a low degree of analogy.
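The marginal indices are simple to compute for all p characteristics at once. This sketch assumes $s_j$ is computed with the divisor n (NumPy's default `ddof=0`); using n - 1 instead would rescale every $M_j$ by the same factor (n - 1)/n. The function name `marginal_indices` is hypothetical.

```python
import numpy as np


def marginal_indices(X, x):
    """Marginal analogy indices M_j = ((x_j - mean_j) / s_j) ** 2.

    X: (n, p) historical characteristics; x: the proposed system's p values.
    s_j is the divisor-n standard deviation of column j.
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    mean = X.mean(axis=0)
    s = X.std(axis=0)  # ddof=0, i.e. divide by n
    return ((x - mean) / s) ** 2


# One characteristic with historical values 1, 2, 3 and a proposed value of 3
print(marginal_indices([[1.0], [2.0], [3.0]], [3.0]))
```

Here the mean is 2 and $s_1^2 = 2/3$, so $M_1 = 1/(2/3) = 1.5$.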


(b) High dimensional comparisons: The collection of marginal indices $\{M_1, M_2, \ldots, M_p\}$ can give a very misleading impression of the overall degree of analogy. Even when $M_j$ is small for every $j$, the proposed system can be terribly nonanalogous to the historical data. A simple bivariate example will illustrate this assertion. Suppose $X_1$ and $X_2$ denote weight and maneuverability, respectively, and that $x_1$ and $x_2$ are each within one standard deviation of their respective means, i.e., $M_1, M_2 \le 1$. Suppose further that, historically, heavy systems tended to be less maneuverable, i.e., $s_{12} < 0$, but the proposed system is a little heavier and slightly more maneuverable than the average. The situation is depicted in the following figure.

[Figure: the proposed system $(x_1, x_2)$ plotted against the historical $(X_1, X_2)$ data]

We see that $(x_1, x_2)$ is marginally analogous to the historical data on both weight and maneuverability but not at all analogous when viewed in two dimensions. Comparing $(x_1, x_2)$ to $(X_1, X_2)$ marginally ignores important relational information.

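A measure of analogy which incorporates relational information was suggested in 1930 by P.C. Mahalanobis (7). He proposed

$$\Delta^2 = \left(\mu^{(1)} - \mu^{(2)}\right)' \Sigma^{-1} \left(\mu^{(1)} - \mu^{(2)}\right)$$

as a measure of the distance between two multivariate populations with mean vectors $\mu^{(1)}$ and $\mu^{(2)}$, respectively, and common covariance matrix $\Sigma$. Replacing the parameter values by estimates, we obtain (in our notation)

$$M = (x - \bar{X})' S^{-1} (x - \bar{X}),$$

where $x = (x_1, \ldots, x_p)'$, $\bar{X} = (\bar{X}_1, \ldots, \bar{X}_p)'$, and $S = (s_{jk})$. Except for a multiplicative constant, this is Hotelling's $T^2$ statistic used to test that $x$ and the historical data came from the same population. In the previous bivariate example, writing $\rho = s_{12}/(s_1 s_2)$ for the sample correlation, it is easy to show that

$$M = \frac{1}{1 - \rho^2}\left[M_1 - 2\rho\,(M_1 M_2)^{1/2} + M_2\right],$$

which can be arbitrarily large even when $M_1$ and $M_2$ are small. For example, with $M_1 = M_2 = \epsilon$,

$$M = \frac{2\epsilon}{1 + \rho} \qquad \text{and} \qquad \lim_{\rho \to -1} M = \infty.$$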
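The bivariate example can be checked numerically. The sketch below simulates a hypothetical history in which weight and maneuverability are strongly negatively correlated, places the proposed system half a standard deviation above both means (so $M_1 = M_2 = 0.25$), and compares $M$ computed from the covariance matrix with the closed form above. It assumes the divisor-n covariance (`bias=True`); the names `mahalanobis_sq` and `bivariate_m` are hypothetical.

```python
import numpy as np


def mahalanobis_sq(X, x):
    """M = (x - Xbar)' S^{-1} (x - Xbar), with S the divisor-n covariance of X."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(x, dtype=float) - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # bias=True: divide by n
    return float(d @ np.linalg.solve(S, d))


def bivariate_m(m1, m2, rho):
    """Closed form of M for two variables when both deviations are positive."""
    return (m1 - 2.0 * rho * np.sqrt(m1 * m2) + m2) / (1.0 - rho ** 2)


# Hypothetical history: heavier systems are less maneuverable (rho near -1)
rng = np.random.default_rng(2)
weight = rng.normal(size=30)
maneuver = -weight + 0.1 * rng.normal(size=30)
X = np.column_stack([weight, maneuver])

# Proposed system: half a standard deviation above BOTH means
mean, sd = X.mean(axis=0), X.std(axis=0)
x = mean + 0.5 * sd
m1, m2 = ((x - mean) / sd) ** 2  # marginal indices, each 0.25
rho = np.corrcoef(X, rowvar=False)[0, 1]

print(m1, m2, rho)
print(mahalanobis_sq(X, x), bivariate_m(m1, m2, rho))  # equal, and large
```

Both marginal indices are small, yet with $\rho$ near -1 the Mahalanobis distance is large, as the formula $2\epsilon/(1+\rho)$ predicts.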
III. THE ROLE OF ANALOGY IN PREDICTION

As mentioned in the previous section, most standard variable selection techniques share the property that, for any given number of variables in the regression function, the optimal set is the set which minimizes the residual mean square error (or, equivalently, maximizes $R^2$). The objective system may be rather nonanalogous to the historical data (large $M$) when we consider the subset of variables identified as "optimal" by the criteria used to develop the CER. Often, there are several $k$-variable models which come close to the "optimum" in terms of $R^2$ and other measures of model aptness based on residual analysis. In these cases, by using a slightly suboptimal set of prediction variables (a slight decrease in $R^2$), it may be possible to substantially improve the degree of analogy (a decrease in $M$). What is the role of analogy in prediction, and how can one evaluate the tradeoff of fit for analogy?

The width of the prediction interval at the point corresponding to the objective system is a numeraire which seems like a reasonable basis for choosing between alternative models. For simplicity, we shall consider a monotone function of the width, namely the square of the half-width, viz.

$$W = F_{1-\alpha;\,1,\,n-k-1} \cdot \mathrm{MSE} \cdot \frac{M + n + 1}{n},$$

where $F_{1-\alpha;\,1,\,n-k-1}$ is the $(1-\alpha)$th fractile of an F distribution with 1 and $n-k-1$ degrees of freedom, $k$ is the number of predictor variables in the CER, and $M$ is computed from those $k$ variables. (This follows from the usual half-width $t_{1-\alpha/2;\,n-k-1}\,[\mathrm{MSE}\,(1 + 1/n + M/n)]^{1/2}$ when the covariances $s_{jk}$ use the divisor $n$, since $t^2_{1-\alpha/2;\,\nu} = F_{1-\alpha;\,1,\,\nu}$.) This measure $W$ combines "fit" (MSE) and "degree of analogy" ($M$) with a factor $F$ which penalizes for using too many variables (increasing $k$) or excluding points from the data base (decreasing $n$). In this form, the role of analogy, as measured by Mahalanobis distance, is evident: it enters as a term in the multiplier $(M + n + 1)/n$ of MSE. Failure to consider this factor in selecting a CER could have a marked effect on prediction precision as measured by prediction interval width.
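The criterion $W$ can be computed directly. The sketch below assumes the divisor-n covariance for $M$, takes the F fractile as an input rather than computing it (so it needs only NumPy), and adds a hypothetical brute-force search, `select_by_w`, over variable subsets of every size. It is not the algorithm promised for subsequent papers, only an illustration of trading fit against analogy.

```python
import itertools

import numpy as np


def w_criterion(X, y, x0, f_quantile):
    """Squared prediction half-width W = F * MSE * (M + n + 1) / n.

    X: (n, k) historical values of the k variables in the CER; y: costs;
    x0: the objective system's values of the same k variables;
    f_quantile: the (1 - alpha) fractile of F(1, n - k - 1), supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    mse = resid @ resid / (n - k - 1)
    d = np.asarray(x0, dtype=float) - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # divisor n
    m = d @ np.linalg.solve(S, d)  # Mahalanobis distance M on these k variables
    return float(f_quantile * mse * (m + n + 1) / n)


def select_by_w(X, y, x0, f_quantile_for_df, max_k=None):
    """Hypothetical exhaustive search for the column subset minimizing W.

    f_quantile_for_df(df) must return the F(1, df) fractile, e.g. from tables
    or, if SciPy is available, lambda df: scipy.stats.f.ppf(0.95, 1, df).
    Returns (W, column_tuple) for the best subset found.
    """
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    max_k = p if max_k is None else max_k
    candidates = []
    for k in range(1, max_k + 1):
        if n - k - 1 < 1:
            break
        f = f_quantile_for_df(n - k - 1)
        for cols in itertools.combinations(range(p), k):
            idx = list(cols)
            candidates.append((w_criterion(X[:, idx], y, x0[idx], f), cols))
    return min(candidates)
```

A useful sanity check is that $W$ equals $F \cdot \mathrm{MSE} \cdot (1 + a_0'(A'A)^{-1}a_0)$, the textbook squared half-width computed from the intercept-augmented design matrix $A$ and $a_0 = (1, x_0')'$; the two agree when $M$ uses the divisor n. Because the F fractile changes with k, the search compares models of different sizes on a common footing.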


IV. SUMMARY

We have pointed out the difficulty of recognizing the degree of analogy in high dimensional spaces. We have suggested Mahalanobis distance as a measure of analogy and pointed to its role in prediction precision.

Reference (10) suggests a 14-step procedure for developing a parametric cost estimate. The importance of understanding the system's technical aspects is stressed in steps 1-3, prior to collecting (step 4) and adjusting (step 5) the data. Somewhere prior to building (step 8) and evaluating (step 9) the CER, we recommend that the analyst "let the data speak for itself". Included in such a "data exploration" ought to be considerations of multivariate degrees of analogy. Our contention is that:

(1) Important and subtle relations among the systems and variables may be overlooked when viewed from a purely technical approach.

(2) Mahalanobis distance is an appropriate measure of analogy which can shed light on these relations.

The analyst should be open-minded (but skeptical) about relations which seem to be suggested. Let the data suggest whatever they will. Relations cannot be viewed with a critical eye if they are not viewed in the first place. If what the data seem to be saying is inconsistent with the analyst's technical understanding, the source of the contradiction deserves close attention.
In subsequent papers we will develop algorithms for building models based on minimizing $W$ and will compare models so obtained with models judged optimal by other criteria.